Last updated: 2025-06-25

Checks: 6 passed, 1 warning

Knit directory: casper_ss_ma/analysis/

This reproducible R Markdown analysis was created with workflowr (version 1.7.1). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.


Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.

The command set.seed(12345) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.

Great job! Recording the operating system, R version, and package versions is critical for reproducibility.

Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.

Using absolute paths to the files within your workflowr project makes it difficult for you and others to run your code on a different machine. Change the absolute path(s) below to the suggested relative path(s) to make your code more reproducible.

absolute relative
/Volumes/scratch/DIMA/piva/casper_ss_ma/ ..

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.

The results in this page were generated with repository version 8bb180c. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .RData
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/

Untracked files:
    Untracked:  .DS_Store
    Untracked:  analysis/.DS_Store
    Untracked:  analysis/02_degs_go_aneuploidy_median.Rmd
    Untracked:  analysis/03_degs_go_CD82expr_median.Rmd
    Untracked:  analysis/VennDiagram.2025-06-09_13-53-40.335615.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-54-51.029086.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-55-15.147126.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-56-18.122749.log
    Untracked:  analysis/VennDiagram.2025-06-09_13-56-30.934079.log
    Untracked:  analysis/VennDiagram.2025-06-09_14-18-19.412377.log
    Untracked:  analysis/VennDiagram.2025-06-18_10-28-53.699452.log
    Untracked:  analysis/VennDiagram.2025-06-18_10-37-36.77178.log
    Untracked:  analysis/VennDiagram.2025-06-18_11-32-36.228427.log
    Untracked:  analysis/VennDiagram.2025-06-18_15-38-55.387683.log
    Untracked:  analysis/VennDiagram.2025-06-18_15-48-17.579371.log
    Untracked:  analysis/VennDiagram.2025-06-18_17-18-17.268774.log
    Untracked:  analysis/VennDiagram.2025-06-19_11-11-17.376961.log
    Untracked:  analysis/VennDiagram.2025-06-19_14-52-46.049026.log
    Untracked:  analysis/VennDiagram.2025-06-19_16-40-05.861139.log
    Untracked:  analysis/VennDiagram.2025-06-19_16-40-07.33202.log
    Untracked:  analysis/VennDiagram.2025-06-19_16-40-08.673023.log
    Untracked:  analysis/VennDiagram.2025-06-19_17-50-05.238063.log
    Untracked:  analysis/VennDiagram.2025-06-19_17-50-07.22979.log
    Untracked:  analysis/VennDiagram.2025-06-19_17-50-09.007028.log
    Untracked:  analysis/VennDiagram.2025-06-19_18-48-01.885712.log
    Untracked:  analysis/VennDiagram.2025-06-19_18-48-03.579702.log
    Untracked:  analysis/VennDiagram.2025-06-19_18-48-04.898695.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-18-23.300456.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-18-24.588109.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-18-26.077856.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-50-54.081682.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-50-55.516535.log
    Untracked:  analysis/VennDiagram.2025-06-20_10-50-56.913582.log
    Untracked:  analysis/VennDiagram.2025-06-20_11-10-43.68944.log
    Untracked:  analysis/VennDiagram.2025-06-20_11-10-45.681514.log
    Untracked:  analysis/VennDiagram.2025-06-20_11-10-47.126222.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-19-10.326514.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-19-11.75991.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-19-13.198666.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-29-09.447741.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-29-11.214146.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-29-12.791818.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-44-02.971891.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-44-04.709094.log
    Untracked:  analysis/VennDiagram.2025-06-20_12-44-06.321173.log
    Untracked:  analysis/VennDiagram.2025-06-24_15-54-45.065538.log
    Untracked:  analysis/VennDiagram.2025-06-24_15-54-48.303942.log
    Untracked:  analysis/VennDiagram.2025-06-24_15-54-50.098014.log
    Untracked:  analysis/VennDiagram.2025-06-25_11-53-49.958809.log
    Untracked:  analysis/VennDiagram.2025-06-25_11-53-51.64026.log
    Untracked:  analysis/VennDiagram.2025-06-25_11-53-53.29465.log
    Untracked:  analysis/VennDiagram.2025-06-25_15-09-09.87969.log
    Untracked:  analysis/VennDiagram.2025-06-25_15-09-14.193409.log
    Untracked:  analysis/VennDiagram.2025-06-25_15-09-17.485413.log
    Untracked:  analysis/VennDiagram.2025-06-25_15-34-29.722117.log
    Untracked:  analysis/VennDiagram.2025-06-25_15-34-31.791802.log
    Untracked:  analysis/VennDiagram.2025-06-25_15-34-34.21193.log
    Untracked:  analysis/hsa04064.HLT-HighAS_vs_HLT-LowAS.png
    Untracked:  analysis/hsa04064.HLT-HighCD82_vs_HLT-LowCD82.png
    Untracked:  analysis/hsa04064.HRplus-HighAS_vs_HRplus-LowAS.png
    Untracked:  analysis/hsa04064.HRplus-HighCD82_vs_HRplus-LowCD82.png
    Untracked:  analysis/hsa04064.HRplus_vs_HLT.png
    Untracked:  analysis/hsa04064.TNBC-HighAS_vs_TNBC-LowAS.png
    Untracked:  analysis/hsa04064.TNBC-HighCD82_vs_TNBC-LowCD82.png
    Untracked:  analysis/hsa04064.TNBC_vs_HLT.png
    Untracked:  analysis/hsa04064.TNBC_vs_HRplus.png
    Untracked:  analysis/hsa04064.png
    Untracked:  analysis/hsa04064.xml
    Untracked:  code/
    Untracked:  data/
    Untracked:  degs_HLT-HighAS_vs_HLT-LowAS.csv
    Untracked:  degs_HRplus-HighAS_vs_HRplus-LowAS.csv
    Untracked:  output/

Unstaged changes:
    Modified:   analysis/00_casper_analysis.Rmd
    Deleted:    analysis/02_deconvolution.Rmd
    Modified:   casper_ss_ma.Rproj

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.


These are the previous versions of the repository in which changes were made to the R Markdown (analysis/03_degs_go_CD82expr.Rmd) and HTML (docs/03_degs_go_CD82expr.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.

File Version Author Date Message
Rmd 8bb180c annamariapiva 2025-06-25 updated notebooks 01, 02, 03 and 04
html a039b3f annamariapiva 2025-06-20 Build site.
Rmd f0e862c annamariapiva 2025-06-20 new reports

Introduction

The goal of this analysis is to identify which pathways are up- or down-regulated in samples with high or low levels of CD82 expression. For each condition (Healthy, HR+, and TNBC), patients are divided into high and low CD82 gene expression groups using the median expression value as a cutoff. The following comparisons are performed:

  • HR+ High-CD82 vs HR+ Low-CD82

  • TNBC High-CD82 vs TNBC Low-CD82

  • Healthy High-CD82 vs Healthy Low-CD82

Overview of the analysis steps

    1. Differential gene expression analysis with DESeq2
    2. Gene set enrichment analysis with clusterProfiler
    3. Gene set enrichment analysis with fgsea on the curated Human MSigDB Collections. In particular, the hallmark gene sets summarize and represent specific well-defined biological states or processes.
    4. Focus on the NF-kB pathway. The R package pathview is used to visualize differentially expressed genes in the KEGG NF-kB signaling pathway (hsa04064).

The input for the following analysis is:

  • counts matrix, normalized with the variance stabilizing transformation (VST) from DESeq2, where each row represents one sample and each column represents one gene, so each cell holds the expression level of a specific gene in a particular sample. VST aims to produce a matrix of values whose variance is approximately constant across the range of mean values, which matters most for genes with a low mean;
  • samples info, including sample name, condition (HRplus, TNBC, Healthy) and batch (240919_rnaseq, 250501_rnaseq).
knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

Loading R packages and input data

The first step of the analysis in R is to load the required packages, load the input data described above, and set the thresholds for the analysis:

  • min_sample = 2, minimum number of samples where the gene needs to have at least 1 read;
  • logfc = log2(2) = 1, the cutoff on the log fold change, which is the ratio between the expression levels of a gene in the two conditions compared, expressed on a logarithmic scale (base 2); a positive log fold change means that the expression of that gene is increased in group1 with respect to group2 by a multiplicative factor of 2^logfc, so a value greater than 1 corresponds to more than a two-fold increase;
  • qvalue = 0.01, the cutoff on the adjusted p-value, which can be interpreted as a false discovery rate: the expected proportion of false positives among all positive results. A low cutoff limits the number of genes reported as differentially expressed when they are not. The logFC and qvalue thresholds were chosen from commonly used values.
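
As a rough illustration of the thresholds above, the sketch below stores them as R variables, converts the log fold-change cutoff back to a linear fold change, and adjusts a few raw p-values. The p-values are made-up numbers, and the use of `p.adjust(method = "BH")` is an assumption about how the adjusted values are obtained (Benjamini-Hochberg is the default adjustment in DESeq2).

```r
# Thresholds as listed above
min_sample <- 2
logfc <- log2(2)   # equals 1
qvalue <- 0.01

# A log2 fold change of `logfc` corresponds to a linear fold change of 2^logfc
linear_fc <- 2^logfc

# Hypothetical raw p-values for five genes (illustrative only)
pvals <- c(0.0001, 0.003, 0.03, 0.2, 0.9)

# Benjamini-Hochberg adjustment (assumed here), which controls the
# false discovery rate
padj <- p.adjust(pvals, method = "BH")

# Genes whose adjusted p-value is below the cutoff
passes <- padj < qvalue
```

Adjusted p-values are never smaller than the raw ones, so fewer genes pass the cutoff after adjustment than before.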

CD82 Gene Expression Distribution

To classify samples into High and Low CD82 expression groups, we examined the distribution of CD82 expression across all samples from the three conditions.

In the distribution plots:

  • The blue line indicates the median CD82 expression of the displayed samples.

  • The red line indicates the median CD82 expression of all the samples (the cutoff used).
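
The split described above can be sketched in a few lines of base R. This is a minimal example on made-up values, not the report's own code: the data frame, its column names and the rule that samples exactly at the cutoff fall into the Low group are all assumptions. It uses the median over all samples as the cutoff, as stated for the red line.

```r
# Hypothetical normalized CD82 expression values (illustrative only)
samples <- data.frame(
  sample    = paste0("S", 1:6),
  condition = c("HRplus", "HRplus", "TNBC", "TNBC", "Healthy", "Healthy"),
  cd82      = c(8.1, 9.4, 7.2, 10.3, 8.8, 9.0)
)

# Cutoff: median CD82 expression over all samples (red line)
cutoff <- median(samples$cd82)

# Samples above the cutoff are labeled High, all others Low (assumed tie rule)
samples$cd82_group <- ifelse(samples$cd82 > cutoff, "HighCD82", "LowCD82")

# Combined label used to name the contrasts, e.g. "HRplus-HighCD82"
samples$group <- paste(samples$condition, samples$cd82_group, sep = "-")

# Median of each condition on its own (blue line in each plot)
condition_medians <- tapply(samples$cd82, samples$condition, median)
```

Comparing `condition_medians` with `cutoff` shows how unbalanced the High and Low groups may become within a condition when a single global cutoff is used.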

Differential expression analysis

Differential expression analysis is performed using a custom function, which accounts for batch effect. A batch effect occurs when non-biological factors, like laboratory conditions or instruments used, in an experiment cause changes in the data produced by the experiment. Lowly expressed genes are removed to reduce noise. Lowly expressed genes are here considered as:

  • genes whose total number of reads is less than half the number of samples;
  • genes expressed in fewer samples than the number of conditions.
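
The two filtering rules above could look roughly like this in base R. The counts are invented, the matrix follows the samples-in-rows layout described for the input, and treating "expressed" as "at least 1 read" is an assumption; the custom function used in this report may differ.

```r
# Hypothetical raw counts: samples in rows, genes in columns (illustrative only)
counts <- matrix(
  c(0, 5, 10, 0,
    0, 0, 12, 3,
    0, 0,  8, 4,
    1, 0,  9, 0,
    0, 0, 11, 2,
    0, 0,  7, 0),
  nrow = 6, byrow = TRUE,
  dimnames = list(paste0("S", 1:6), paste0("gene", LETTERS[1:4]))
)
n_conditions <- 3  # e.g. HRplus, TNBC, Healthy

# Rule 1: total reads must be at least half the number of samples
enough_reads <- colSums(counts) >= nrow(counts) / 2

# Rule 2: gene must be expressed (>= 1 read, assumed) in at least
# as many samples as there are conditions
enough_samples <- colSums(counts >= 1) >= n_conditions

# Keep only genes passing both rules
filtered <- counts[, enough_reads & enough_samples, drop = FALSE]
```

Here geneA fails the first rule, geneB fails the second, and geneC and geneD are kept.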

Number of samples per condition

Version Author Date
a039b3f annamariapiva 2025-06-20

PCA

Let’s look at the PCA and at the gene expression pattern across samples. The batch effect is accounted for in the design, but has not been corrected for in this plot.
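
To make the idea concrete, here is a minimal PCA sketch on simulated data using base R's `prcomp()`. The expression values and the artificial shift added to one batch are assumptions for illustration only; the batch names are the ones listed in the samples info. Because the shift is left uncorrected, it contributes to the variance captured by the components, which is why uncorrected batches can drive the separation in a PCA plot.

```r
set.seed(12345)  # same seed as reported for this analysis

# Hypothetical normalized expression: 6 samples (rows) x 50 genes (columns)
expr <- matrix(rnorm(6 * 50, mean = 8, sd = 1), nrow = 6,
               dimnames = list(paste0("S", 1:6), paste0("gene", 1:50)))
batch <- factor(rep(c("240919_rnaseq", "250501_rnaseq"), each = 3))

# Simulate an uncorrected batch shift on the first 20 genes
second <- batch == "250501_rnaseq"
expr[second, 1:20] <- expr[second, 1:20] + 3

# PCA on centered data; samples are the observations
pca <- prcomp(expr, center = TRUE, scale. = FALSE)

# Percentage of variance explained by each component
var_explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# Scores on the first two components, ready for plotting by batch
scores <- data.frame(pca$x[, 1:2], batch = batch)
```

Plotting `scores$PC1` against `scores$PC2`, colored by `batch`, gives the kind of view described above.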

Contrast 1: HRplus-HighCD82 vs HRplus-LowCD82

PCA of selected conditions

Here is the PCA of the samples selected for the first comparison.

MA plot and volcano plot

Genes are annotated as significant or not significant. Significant genes are those showing meaningful changes, that is, an adjusted p-value below the threshold defined above and an absolute log2FoldChange greater than the cutoff defined above.
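
The annotation rule can be sketched as follows. The results table is invented and only borrows the usual DESeq2 column names (`log2FoldChange`, `padj`); treating genes with a missing adjusted p-value as not significant is an assumption, as is the extra `direction` column.

```r
# Thresholds as defined earlier in this report
logfc <- log2(2)
qvalue <- 0.01

# Hypothetical results table (illustrative only)
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(2.5, -1.8, 0.4, 1.2, -3.0),
  padj           = c(0.001, 0.005, 0.0001, 0.2, NA)
)

# Significant only if both thresholds are passed;
# genes with NA adjusted p-values count as not significant (assumed)
res$significant <- !is.na(res$padj) &
  res$padj < qvalue &
  abs(res$log2FoldChange) > logfc

# Direction of change for significant genes, "ns" otherwise
res$direction <- ifelse(!res$significant, "ns",
                        ifelse(res$log2FoldChange > 0, "up", "down"))
```

In this toy table g3 has a very small adjusted p-value but a small fold change, so it is not flagged, which is the point of combining the two criteria.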

Table of all differentially expressed genes

Heatmap for top 20 genes

Among the differentially expressed genes computed previously, let’s visualize the top 20 significant genes and all the DE genes.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).
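
The standardized scale behind these colors is typically a per-gene z-score. The sketch below assumes that this is the standardization used here (the report does not show it) and works on a made-up matrix with genes in rows.

```r
# Hypothetical normalized expression: genes in rows, samples in columns
expr <- matrix(
  c( 5,  7,  9,
    10, 10, 13),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3"))
)

# scale() standardizes columns, so transpose to standardize each gene,
# then transpose back
z <- t(scale(t(expr)))

# Positive z-scores map to red, negative to blue, values near 0 to white
color_class <- ifelse(z > 0, "red", ifelse(z < 0, "blue", "white"))
```

After this step every gene has mean 0 and standard deviation 1 across samples, so colors are comparable between genes with very different expression levels.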

Gene set enrichment analysis

Further analysis is done through gene set enrichment analysis, which does not exclude genes based on logfc or adjusted p-value, as done previously. GSEA is performed separately on each subontology: biological processes (BP), cellular components (CC) and molecular functions (MF). The dot plot below shows the top 10 most enriched GO terms. The size of each dot correlates with the count of differentially expressed genes associated with each GO term. Furthermore, the color of each dot reflects the significance of the enrichment of the respective GO term, highlighting its relative importance.

Biological Processes (BP)

Cellular Components (CC)

Molecular function (MF)

GSEA with Hallmark Pathways (msigdbr)

To identify biologically meaningful patterns of gene expression, we performed Gene Set Enrichment Analysis (GSEA) using the MSigDB Hallmark gene sets, which summarize well-defined biological states or processes. Genes were ranked by Log2 Fold Change. Significantly enriched pathways were identified based on normalized enrichment score (NES) and adjusted p-values (FDR) (p.adj < 0.05). Positively enriched pathways are upregulated in the first group, while negatively enriched pathways indicate suppression.
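
To show what the ranking and the sign of the enrichment mean, here is a simplified sketch of the weighted running-sum enrichment score used in GSEA. It is not the fgsea implementation: the function name `enrichment_score`, the gene names and the values are hypothetical, and the permutation step needed to obtain the NES and p-values is left out.

```r
# Simplified GSEA enrichment score (illustrative sketch only)
enrichment_score <- function(ranks, gene_set, p = 1) {
  # Rank genes from most up-regulated to most down-regulated
  ranks <- sort(ranks, decreasing = TRUE)
  in_set <- names(ranks) %in% gene_set

  # Running sum increases at genes in the set (weighted by |rank|^p) ...
  hit <- cumsum(ifelse(in_set, abs(ranks)^p, 0)) / sum(abs(ranks[in_set])^p)
  # ... and decreases at genes outside the set
  miss <- cumsum(!in_set) / sum(!in_set)

  running <- hit - miss
  # Score: the largest deviation of the running sum from zero
  unname(running[which.max(abs(running))])
}

# Hypothetical log2 fold changes for ten genes
lfc <- c(5, 4, 3, 2, 1, -1, -2, -3, -4, -5)
names(lfc) <- paste0("g", 1:10)

es_up   <- enrichment_score(lfc, c("g1", "g2", "g3"))   # set at the top
es_down <- enrichment_score(lfc, c("g8", "g9", "g10"))  # set at the bottom
```

A gene set concentrated at the top of the ranking gets a positive score (higher in the first group), while a set concentrated at the bottom gets a negative one.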

Pathway viewer: focus on NF-kB signaling pathway (KEGG id: hsa04064)

[1] "Note: 4434 of 14384 unique input IDs unmapped."

Contrast 2: TNBC-HighCD82 vs TNBC-LowCD82

PCA of selected conditions

Here is the PCA of the samples selected for the second comparison.

MA plot and volcano plot

Genes are annotated as significant or not significant. Significant genes are those showing meaningful changes, that is, an adjusted p-value below the threshold defined above and an absolute log2FoldChange greater than the cutoff defined above.

Heatmap for top 20 genes

Among the differentially expressed genes computed previously, let’s visualize the top 20 significant genes and all the DE genes.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).

Gene set enrichment analysis

Biological Processes (BP)

Cellular Components (CC)

Molecular function (MF)

GSEA msigdb

Contrast 3: Healthy-HighCD82 vs Healthy-LowCD82

PCA of selected conditions

Here is the PCA of the samples selected for the third comparison.

MA plot and volcano plot

Genes are annotated as significant or not significant. Significant genes are those showing meaningful changes, that is, an adjusted p-value below the threshold defined above and an absolute log2FoldChange greater than the cutoff defined above.

Heatmap for top 20 genes

Among the differentially expressed genes computed previously, let’s visualize the top 20 significant genes and all the DE genes.

Meaning of Colors

  • Red: Indicates high expression for that gene in a given sample (value above average, positive compared to the standardized scale).
  • Blue: Indicates low expression for that gene in a given sample (value below average, negative compared to the standardized scale).
  • White (or intermediate color): Indicates an expression close to the average (standardized value around 0).

Gene set enrichment analysis

Biological Processes (BP)

Cellular Components (CC)

Molecular function (MF)

GSEA msigdb

Common pathways

Biological Processes Pathways

Cellular Components Pathways

Molecular Functions Pathways

GSEA Heatmap Hallmarks - all comparisons

Table of all genes


R version 4.3.1 (2023-06-16)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.4.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Rome
tzcode source: internal

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] VennDiagram_1.7.3           futile.logger_1.4.3        
 [3] pathview_1.40.0             tibble_3.3.0               
 [5] fgsea_1.26.0                msigdbr_24.1.0             
 [7] gridExtra_2.3               dplyr_1.1.4                
 [9] clusterProfiler_4.8.2       plotly_4.10.4              
[11] reshape_0.8.9               ggplot2_3.5.2              
[13] gplots_3.2.0                RColorBrewer_1.1-3         
[15] ComplexHeatmap_2.16.0       rtracklayer_1.60.1         
[17] DESeq2_1.40.2               SummarizedExperiment_1.30.2
[19] Biobase_2.60.0              MatrixGenerics_1.12.3      
[21] matrixStats_1.5.0           GenomicRanges_1.52.1       
[23] GenomeInfoDb_1.36.4         IRanges_2.34.1             
[25] S4Vectors_0.38.2            BiocGenerics_0.46.0        
[27] DT_0.33                    

loaded via a namespace (and not attached):
  [1] splines_4.3.1            later_1.4.2              BiocIO_1.10.0           
  [4] bitops_1.0-9             ggplotify_0.1.2          polyclip_1.10-7         
  [7] graph_1.78.0             XML_3.99-0.18            lifecycle_1.0.4         
 [10] doParallel_1.0.17        rprojroot_2.0.4          lattice_0.22-7          
 [13] MASS_7.3-60              crosstalk_1.2.1          magrittr_2.0.3          
 [16] sass_0.4.10              rmarkdown_2.29           jquerylib_0.1.4         
 [19] yaml_2.3.10              httpuv_1.6.16            cowplot_1.1.3           
 [22] DBI_1.2.3                abind_1.4-8              zlibbioc_1.46.0         
 [25] purrr_1.0.4              ggraph_2.2.1             RCurl_1.98-1.17         
 [28] yulab.utils_0.2.0        tweenr_2.0.3             git2r_0.36.2            
 [31] circlize_0.4.16          GenomeInfoDbData_1.2.10  enrichplot_1.20.0       
 [34] ggrepel_0.9.6            tidytree_0.4.6           codetools_0.2-20        
 [37] DelayedArray_0.26.7      DOSE_3.26.2              ggforce_0.4.2           
 [40] tidyselect_1.2.1         shape_1.4.6.1            aplot_0.2.5             
 [43] farver_2.1.2             viridis_0.6.5            GenomicAlignments_1.36.0
 [46] jsonlite_2.0.0           GetoptLong_1.0.5         tidygraph_1.3.1         
 [49] iterators_1.0.14         foreach_1.5.2            tools_4.3.1             
 [52] treeio_1.24.3            Rcpp_1.0.14              glue_1.8.0              
 [55] xfun_0.52                qvalue_2.32.0            withr_3.0.2             
 [58] formatR_1.14             fastmap_1.2.0            caTools_1.18.3          
 [61] digest_0.6.37            R6_2.6.1                 gridGraphics_0.5-1      
 [64] colorspace_2.1-1         GO.db_3.17.0             gtools_3.9.5            
 [67] RSQLite_2.4.1            tidyr_1.3.1              generics_0.1.4          
 [70] data.table_1.17.6        graphlayouts_1.2.2       httr_1.4.7              
 [73] htmlwidgets_1.6.4        S4Arrays_1.0.6           scatterpie_0.2.4        
 [76] whisker_0.4.1            pkgconfig_2.0.3          gtable_0.3.6            
 [79] blob_1.2.4               workflowr_1.7.1          XVector_0.40.0          
 [82] shadowtext_0.1.4         htmltools_0.5.8.1        clue_0.3-66             
 [85] scales_1.4.0             png_0.1-8                ggfun_0.1.8             
 [88] lambda.r_1.2.4           knitr_1.50               rstudioapi_0.17.1       
 [91] reshape2_1.4.4           rjson_0.2.23             nlme_3.1-168            
 [94] curl_6.3.0               org.Hs.eg.db_3.17.0      cachem_1.1.0            
 [97] GlobalOptions_0.1.2      stringr_1.5.1            KernSmooth_2.23-26      
[100] parallel_4.3.1           HDO.db_0.99.1            AnnotationDbi_1.62.2    
[103] restfulr_0.0.15          pillar_1.10.2            vctrs_0.6.5             
[106] promises_1.3.3           cluster_2.1.8.1          Rgraphviz_2.44.0        
[109] evaluate_1.0.4           KEGGgraph_1.60.0         cli_3.6.5               
[112] locfit_1.5-9.12          compiler_4.3.1           futile.options_1.0.1    
[115] Rsamtools_2.16.0         rlang_1.1.6              crayon_1.5.3            
[118] labeling_0.4.3           plyr_1.8.9               fs_1.6.6                
[121] stringi_1.8.7            viridisLite_0.4.2        BiocParallel_1.34.2     
[124] assertthat_0.2.1         babelgene_22.9           Biostrings_2.68.1       
[127] lazyeval_0.2.2           GOSemSim_2.26.1          Matrix_1.6-4            
[130] patchwork_1.3.0          bit64_4.6.0-1            KEGGREST_1.40.1         
[133] igraph_2.1.4             memoise_2.0.1            bslib_0.9.0             
[136] ggtree_3.8.2             fastmatch_1.1-6          bit_4.6.0               
[139] downloader_0.4.1         ape_5.8-1                gson_0.1.0